SYSTEM_TEXT_IMAGE_EXIST = """
### Task:
You are an AI observer and judge. Given an indoor scene image and a description, evaluate how accurately the objects and surroundings in the description match the image, ensuring all described objects exist.

### Steps:
1. **Description Analysis**: Identify all objects in the description, including attributes (e.g., color, material, spatial relations).
2. **Image Observation**: Examine the image carefully, verifying the presence, attributes, and relationships of the described objects.
3. **Matching Evaluation**: Compare the description with the image, identifying matches and discrepancies. If multiple objects are described, check each one.
4. **Final Judgment**: Provide a **Rationale** and a **Score** (0-10) on separate lines, in the format:
   Rationale: xxxx
   Score: [0-10]

### Example:
**Input:**
- **Image**: (An image will be provided)
- **Description**: "A brown armchair facing a dark-red wooden desk."

**Output:**
1. **Description Analysis**:
   - Objects: **Brown armchair**, **dark-red wooden desk**
   - Relationship: Armchair is facing the desk.
2. **Image Observation**:
   - The image contains multiple **brown armchairs** and **wooden tables**.
   - Some armchairs are **facing tables**.
   - The tables appear **brown or dark wood**, which may be subjective due to lighting or perception.
3. **Matching Evaluation**:
   - **"A brown armchair"** → Found multiple matches.
   - **"Facing a dark-red wooden desk"** → Partially matches: Wooden tables exist, but color and classification (desk vs. table) differ slightly.
4. **Final Judgment**:
   Rationale: matching object exists with minor differences (desk vs. table) that may be subjective.
   Score: 7

### Scoring Criteria:
- **0-2**: No object matches the description.
- **3-5**: Some objects match, but major discrepancies exist (e.g., missing elements).
- **6-8**: Object(s) match, with minor, non-critical differences.
- **9-10**: The object fully matches the description.

### Key Considerations:
- Focus on object attributes, spatial relationships, and environmental context.
- Verify visual features (e.g., color, shape, position) strictly based on the image.
- If an element cannot be confirmed, it lowers the score (typically **3-5** range).
- The **Rationale** must always be on the second-to-last line.
- The **Score** must always be on the last line.
"""

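
# Hypothetical helper, not part of the original pipeline: a minimal sketch of
# how a caller might read back the answer layout these prompts request
# (Rationale on the second-to-last line, Score on the last line). It assumes
# the model follows that layout, tolerates list bullets or **bold** around the
# labels, and also accepts the one-line form "Rationale: ...; Score: N".
# Either field is returned as None when it cannot be found.
import re
from typing import Optional, Tuple

_SCORE_RE = re.compile(r"\bscore\**\s*:\s*\**\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
_RATIONALE_RE = re.compile(r"\brationale\**\s*:\s*\**\s*(.*)", re.IGNORECASE)
_INLINE_RE = re.compile(r"\brationale\**\s*:\s*(.*?)\s*;\s*\**score", re.IGNORECASE)


def parse_rationale_and_score(response: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (rationale, score) taken from the last two non-empty lines."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return None, None
    score_match = _SCORE_RE.search(lines[-1])
    score = float(score_match.group(1)) if score_match else None
    rationale = None
    if len(lines) >= 2:
        rationale_match = _RATIONALE_RE.search(lines[-2])
        if rationale_match:
            rationale = rationale_match.group(1).strip()
    if rationale is None:
        # Fallback for the one-line form "Rationale: ...; Score: N".
        inline_match = _INLINE_RE.search(lines[-1])
        if inline_match:
            rationale = inline_match.group(1).strip()
    return rationale, score


# Example: parse_rationale_and_score("Rationale: ok\nScore: 7") returns ("ok", 7.0).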

SYSTEM_TEXT_IMAGE_UNIQUENESS = """
### Task:
You are an AI observer and judge. Given an indoor scene image and a description, determine whether **only one** object in the image matches the description, ensuring uniqueness and preventing ambiguity.
(Here, **only one** object means a single object instance, not multiple objects or a combination of objects.)

### Steps:
1. **Description Analysis**: Identify objects and relationships mentioned in the description.
2. **Image Observation**: Examine object attributes, materials, and quantities.
3. **Uniqueness Evaluation**: Find all objects matching the description and determine if only one uniquely fits.
4. **Final Judgment**: Provide a **Rationale** and a **Score** (0-10) on separate lines, in the format:
   Rationale: xxxx
   Score: [0-10]

### Example:
**Input:**
- **Image**: (An image will be provided)
- **Description**: "A brown armchair facing a wooden desk."

**Output:**
1. **Description Analysis**: The description mentions a *brown armchair* and a *wooden desk*, with a spatial relation (*facing*).
2. **Image Observation**:
   - The image contains **three brown armchairs** and **one red armchair**.
   - **The red armchair does not match the description**.
   - **One brown armchair is not facing the desk, making it irrelevant**.
3. **Uniqueness Evaluation**:
   - **Two brown armchairs are still facing desks, so the description lacks uniqueness**.
4. **Final Judgment**:
   Rationale: two brown armchairs face desks, so the description lacks uniqueness.
   Score: 3

### Scoring Criteria:
- **0-2**: No object matches the description.
- **3-5**: Multiple objects match; uniqueness is unclear.
- **6-8**: One object primarily matches, but ambiguity remains.
- **9-10**: Only one object clearly matches the description.

### Key Considerations:
- Consider object attributes, spatial relations, and context.
- Be strict in determining uniqueness.
- The **Rationale** must always be on the second-to-last line.
- The **Score** must always be on the last line.
"""


SYSTEM_TEXT_SENTENCE_PARADOX = """
### Background: 
In the Visual Grounding task, a referring expression should match exactly one object in the scene. However, some descriptions contain a paradox and cannot logically identify a single object.
This ambiguity typically arises in "next to another/other" descriptions. 
For example:  
- "There is a chair, it is next to another identical chair."  
- "This is a chair, it is next to a chair."  

These descriptions create a circular reference: if chair A is next to chair B, then B is also next to A, making it impossible to uniquely identify the intended chair.

### Task: 
You are a logical analyst. Determine whether a given description contains this paradox (or other semantically similar situations).

### Rating System:
- **0-1**: The description contains this paradox. 
- **2-8**: The logic of the sentence is difficult to assess.
- **9-10**: The description is clear and does not contain the ambiguity.

### Output Format:
After your analysis, return the result in the following format:  
    - Rationale: ... (A brief summary)
    - Score: [0-10]

### Notes:
1. If additional details resolve the ambiguity, rate it as **10**.  
  **Example:** "It is a black chair, it is next to a red chair." → 
  - Rationale: xxx 
  - Score: 10
2. The **Rationale** must always be on the second-to-last line.
3. The **Score** must always be on the last line.
4. Extreme ratings such as 0 or 10 are encouraged.
"""

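
# Hypothetical helper, not part of the original pipeline: maps the 0-10 score
# requested by SYSTEM_TEXT_SENTENCE_PARADOX onto the three bands of its
# "Rating System" (0-1 paradox, 2-8 hard to assess, 9-10 clear). The band
# labels are illustrative names chosen here; fractional scores fall into the
# band whose bounds contain them.
def paradox_band(score: float) -> str:
    """Return 'paradox', 'uncertain' or 'clear' for a 0-10 paradox score."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 1:
        return "paradox"
    if score < 9:
        return "uncertain"
    return "clear"


# Example: paradox_band(0) returns "paradox" and paradox_band(10) returns "clear".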

SYSTEM_TEXT_SENTENCE_GROUP_1 = """
### Background: 
You are a language logic analyzer. Given a **description group G** whose descriptions all refer to the same object, identify logical inconsistencies among them.

### Task:
1. **Cross-Verify G**:  
   - Identify **valid descriptions** in G by checking consistency among them.  
   - If multiple descriptions align, they are considered reliable; otherwise, discard conflicting ones.   

2. **Scoring Criteria**:  
   - **0**: Conflicts with other valid descriptions or contains major logical errors.  
   - **1**: Does not conflict with any other description, but is not confirmed by them.
   - **2**: Confirmed by other descriptions.

3. **Output Format**:  
   Score each description in G and output a list, such as:
   Scores: [1, 0, 1, 2, 2]

### Example:
**Input:**
```
{
  "object_category": "cabinet",
  "description group G": {
    "id-1": "The tall cabinet next to the desk.",
    "id-2": "A white cabinet near a short table.",
    "id-3": "A white cabinet beside a desk.",
    "id-4": "A green curtain in the room’s corner."
  }
}
```

Return:
- **Valid descriptions in G**:
  - id-1, id-2, id-3 describe a **white cabinet** near a **desk/table** → Valid.
  - id-4 refers to a curtain → **Invalid**.
  Scores: [2, 2, 2, 0]

### Notes:
1. Descriptions may be written casually and may span several sentences. Judge logical ambiguity in that colloquial context; do not worry about grammar or sentence structure.
2. Different annotators phrase things differently. Focus on whether descriptions share the same meaning rather than on word-for-word agreement of details.
3. The **Scores** line must always be the last line.
"""

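
# Hypothetical helper, not part of the original pipeline: a minimal sketch,
# assuming the model follows the "Scores: [...]" last-line rule of
# SYSTEM_TEXT_SENTENCE_GROUP_1. It reads the list with ast.literal_eval (which
# only accepts Python literals, never arbitrary code) and checks that there is
# one score per description in G and that each score is 0, 1 or 2.
import ast
import re
from typing import List

_GROUP_SCORES_RE = re.compile(r"\bscores\**\s*:\s*(\[.*\])", re.IGNORECASE)


def parse_group_scores(response: str, num_descriptions: int) -> List[int]:
    """Parse the final 'Scores: [...]' line and validate it against the group size."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    match = _GROUP_SCORES_RE.search(lines[-1]) if lines else None
    if match is None:
        raise ValueError("last line does not contain 'Scores: [...]'")
    scores = ast.literal_eval(match.group(1))
    if not isinstance(scores, list) or len(scores) != num_descriptions:
        raise ValueError(f"expected {num_descriptions} scores, got {scores!r}")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError(f"every score must be 0, 1 or 2, got {scores!r}")
    return [int(s) for s in scores]


# Example: for the cabinet example above, the last line "Scores: [2, 2, 2, 0]"
# with num_descriptions=4 yields [2, 2, 2, 0].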

SYSTEM_TEXT_SENTENCE_GROUP_2 = """
### Task: 
You are a language logic analyzer. Given a **description D** of an object and a **description group G** of the same object, determine if D has logical inconsistencies using G as a reference.

### Steps:
1. **Identify Contradictions in D**:  
   - Compare D with the **valid descriptions** from G.  
   - If D contradicts G (e.g., different object type, location mismatch), it has logical issues.  

2. **Scoring Criteria**:  
   - **7-10**: D is fully consistent or unverifiable but plausible.  
   - **4-6**: Minor ambiguities or contradictions.  
   - **1-3**: Significant inconsistencies.  
   - **0**: Major logical errors.  

3. **Output Format**:  
   - First, list the valid descriptions in G.  
   - Identify contradictions between D and G.  
   - Finally, output the score in this format:  
     ```
     Rationale: ... (A brief summary)
     Score: [0-10]
     ```

### Example:
**Input:**
```json
{
  "object_category": "cabinet",
  "description D": "This is a beige door next to the wall.",
  "description group G": {
    "id-1": "The tall cabinet next to the desk.",
    "id-2": "A white cabinet near a short table.",
    "id-3": "A white cabinet beside a desk."
  }
}
```

Return:
  (some other analysis ...)
  Rationale: D describes a **beige door**, conflicting with G’s **white cabinet**. No evidence of a door in G.
  Score: 0

### Notes:
1. Descriptions may be written casually and may span several sentences. Judge logical ambiguity in that colloquial context; do not worry about grammar or sentence structure.
2. Different annotators phrase things differently. Focus on whether descriptions share the same meaning rather than on word-for-word agreement of details.
3. Focus on conflicts between descriptions rather than on how complete a description is. A description that does not conflict with the other information but cannot be verified should receive a score of 8.
4. The **Rationale** must always be on the second-to-last line.
5. The **Score** must always be on the last line.
"""


# ==== refines ===

SYSTEM_TEXT_REFER_REFINEMENT_POS = """
### Task: 
You are an AI analyst. Given a message list, analyze the correlations and inconsistencies, determine which entries might be anomalous, and identify combinations of entries whose information might help correct the problematic ones.

Introduction:
The input consists of a Description and a message list whose entries have the form {ID: [summary, score]}. Each entry evaluates the alignment between the description and an image of an object in the scene, taken from a different viewpoint. However, due to issues like incomplete framing or blurriness in some views, certain evaluations might be inaccurate. Your task is to identify potentially erroneous entries and analyze which other entries might provide helpful context or complementary information to make the judgment more accurate and complete.

Steps:
1. Analyze the overall score trend.
2. For each entry, determine whether an unusually high or low score might be caused by visual limitations (underestimation) or superficial judgment (overestimation).
3. Output a list of misjudged entries and their most helpful supporting entry.

Output format (as a list):
[{"suspicion": <ID_of_misjudged_entry>, "assistance": <ID_of_assisting_entry>}, ...]

Example:
Input:
Description: xxxx
Message list: [{0: [Rationale: xxxxx, Score: x]}, {1: [Rationale: xxxxx, Score: x]}, {2: [Rationale: xxxxx, Score: x]}, {3: [Rationale: xxxxx, Score: x]}]
Output:
Step-by-step analysis ....
- Final Judgment: 
  [{"suspicion": 3, "assistance": 1}, ...]

### Note:
- The Final Judgment list must always be on the last line. Write it on a single line; do not split it across lines or wrap it in a JSON code block.
"""

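
# Hypothetical helper, not part of the original pipeline: parses the last line
# requested by SYSTEM_TEXT_REFER_REFINEMENT_POS, e.g.
#   [{"suspicion": 3, "assistance": 1}, {"suspicion": 0, "assistance": 2}]
# It assumes the list is a valid Python literal starting somewhere on the last
# line, and silently drops items that lack integer "suspicion" and
# "assistance" values (for example a mistyped key). Unparsable output yields [].
import ast
from typing import List, Tuple


def parse_refinement_pairs(response: str) -> List[Tuple[int, int]]:
    """Return (suspicion, assistance) ID pairs from the final list."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return []
    start = lines[-1].find("[")
    if start == -1:
        return []
    try:
        items = ast.literal_eval(lines[-1][start:])
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, list):
        return []
    pairs = []
    for item in items:
        if not isinstance(item, dict):
            continue  # skip any stray non-dict element
        suspicion, assistance = item.get("suspicion"), item.get("assistance")
        if isinstance(suspicion, int) and isinstance(assistance, int):
            pairs.append((suspicion, assistance))
    return pairs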

SYSTEM_TEXT_REFER_REFINEMENT_POS_UNIQUE = """
### Task: 
You are an AI analyst. Given a message list, analyze the correlations and inconsistencies, determine which entries might be anomalous, and identify combinations of entries whose information might help correct the problematic ones.

Introduction:
The input consists of a Description and a message list whose entries have the form {ID: [summary, score]}. Each entry evaluates whether this object is the only one in the image that matches the Description, judged from a different viewpoint of the object in the scene. However, due to issues like incomplete framing or blurriness in some views, certain evaluations might be inaccurate. Your task is to identify potentially erroneous entries and analyze which other entries might provide helpful context or complementary information to make the judgment more accurate and complete.
Some entries contain only None, e.g. {ID: [None]}; ignore these entries.

Steps:
1. Analyze the overall score trend.
2. For each entry, determine whether an unusually high score might be caused by visual limitations.
3. Output a list of misjudged entries and their most helpful supporting entry.

Output format (as a list):
[{"suspicion": <ID_of_misjudged_entry>, "assistance": <ID_of_assisting_entry>}, ...]

Example:
Input:
Description: xxxx
Message list: [{0: [Rationale: xxxxx, Score: x]}, {1: [Rationale: xxxxx, Score: x]}, {2: [None]}, {3: [Rationale: xxxxx, Score: x]}]
Output:
Step-by-step analysis ....
- Final Judgment: 
  [{"suspicion": 3, "assistance": 1}, {"suspicion": 0, "assistance": 1}]

### Note:
- The Final Judgment list must always be on the last line. Write it on a single line; do not split it across lines or wrap it in a JSON code block.
"""

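
# Hypothetical helper, not part of the original pipeline: a minimal sketch of
# how the input for the refinement prompts might be assembled, including the
# {ID: [None]} placeholder that SYSTEM_TEXT_REFER_REFINEMENT_POS_UNIQUE tells
# the model to ignore. The name `build_refinement_input` and its `per_view`
# argument (view ID -> (rationale, score), or None for an unscored view) are
# assumptions; the string layout copies the prompt's own example.
from typing import Dict, Optional, Tuple


def build_refinement_input(
    description: str, per_view: Dict[int, Optional[Tuple[str, float]]]
) -> str:
    """Render 'Description: ...' and 'Message list: [...]' lines, ordered by view ID."""
    entries = []
    for view_id in sorted(per_view):
        result = per_view[view_id]
        if result is None:
            entries.append(f"{{{view_id}: [None]}}")
        else:
            rationale, score = result
            entries.append(f"{{{view_id}: [Rationale: {rationale}, Score: {score}]}}")
    return f"Description: {description}\nMessage list: [{', '.join(entries)}]"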

SYSTEM_TEXT_REFER_REFINEMENT_MIS = """
### Task: 
You are an AI analyst. Given a message list, analyze the correlations and inconsistencies, and determine which entries might be anomalous.

Introduction:
The input consists of a Description and a message list whose entries have the form {ID: [summary, score]}. Each entry evaluates the alignment between the description and an image of an object in the scene, taken from a different viewpoint. However, due to issues like incomplete framing or blurriness in some views, certain evaluations might be inaccurate. Your task is to identify potentially erroneous entries; other entries may provide helpful context or complementary information for this judgment.

Steps:
1. Analyze the overall score trend.
2. For each entry, determine whether an unusually high or low score might be caused by careless observation.
3. Output a list of misjudged entries.

Output format (as a list):
[<ID_of_misjudged_entry>, <ID_of_misjudged_entry>, ...]

Example:
Input:
Description: xxxx
Message list: [{0: [Rationale: xxxxx, Score: x]}, {1: [Rationale: xxxxx, Score: x]}, {2: [Rationale: xxxxx, Score: x]}, {3: [Rationale: xxxxx, Score: x]}]
Output:
Step-by-step analysis ....
- Final Judgment: 
  [3, 1]

### Note:
- The Final Judgment list must always be on the last line.
"""

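
# Hypothetical helper, not part of the original pipeline: parses the plain ID
# list (e.g. "[3, 1]") that SYSTEM_TEXT_REFER_REFINEMENT_MIS puts on its last
# line. As a defensive assumption it normalizes full-width commas ("，"), keeps
# only non-negative integer IDs, and removes duplicates while preserving order.
import re
from typing import List


def parse_misjudged_ids(response: str) -> List[int]:
    """Return the misjudged entry IDs from the final list, or [] if none is found."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return []
    match = re.search(r"\[([^\[\]]*)\]", lines[-1].replace("，", ","))
    if match is None:
        return []
    ids: List[int] = []
    for token in match.group(1).split(","):
        token = token.strip()
        if token.isdigit() and int(token) not in ids:
            ids.append(int(token))
    return ids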

SYSTEM_TEXT_IMAGE_EXIST_FOR_REFINE = """
### Task:
You are an AI observer and judge. Given an image containing two views and a description, evaluate how accurately the objects and surroundings in the description match the image, ensuring all described objects exist.

### Steps:
1. **Description Analysis**: Identify all objects in the description, including attributes (e.g., color, material, spatial relations).
2. **Image Observation**: Examine the image carefully, verifying the presence, attributes, and relationships of the described objects, and work out how the two views relate to each other.
3. **Matching Evaluation**: Compare the description with the image, identifying matches and discrepancies. If multiple objects are described, check each one.
4. **Final Judgment**: Provide a **Rationale** and a **Score** (0-10) on separate lines, in the format:
   Rationale: xxxx
   Score: [0-10]

### Example:
**Input:**
- **Image**: (An image containing two views will be provided)
- **Description**: "A brown armchair facing a dark-red wooden desk." 
- **Additional Information**: xxxx (Reasons for choosing the images or other additional information)

**Output:**
1. **Description Analysis**:
   - Objects: **Brown armchair**, **dark-red wooden desk**
   - Relationship: Armchair is facing the desk.
2. **Image Observation**:
   - The image contains multiple **brown armchairs** and **wooden tables**.
   - Some armchairs are **facing tables**.
   - The tables appear **brown or dark wood**, which may be subjective due to lighting or perception.
3. **Matching Evaluation**:
   - **"A brown armchair"** → Found multiple matches.
   - **"Facing a dark-red wooden desk"** → Partially matches: Wooden tables exist, but color and classification (desk vs. table) differ slightly.
4. **Final Judgment**:
   Rationale: matching object exists with minor differences (desk vs. table) that may be subjective.
   Score: 7

### Scoring Criteria:
- **0-2**: No object matches the description.
- **3-5**: Some objects match, but major discrepancies exist (e.g., missing elements).
- **6-8**: Object(s) match, with minor, non-critical differences.
- **9-10**: The object fully matches the description.

### Key Considerations:
- Focus on object attributes, spatial relationships, and environmental context.
- Verify visual features (e.g., color, shape, position) strictly based on the image.
- If an element cannot be confirmed, it lowers the score (typically **3-5** range).
- The **Rationale** must always be on the second-to-last line.
- The **Score** must always be on the last line.
"""


SYSTEM_TEXT_IMAGE_UNIQUENESS_FOR_REFINE = """
### Task:
You are an AI observer and judge. Given an indoor scene image containing two views and a description, determine whether **only one** object in the image matches the description, ensuring uniqueness and preventing ambiguity.
This task determines whether the description can serve as a referring expression in the Visual Grounding task, i.e., whether it identifies the bounding box of exactly one object.
If either of the two views shows that the description is not unique, assign a low (non-unique) score.

### Steps:
1. **Description Analysis**: Identify objects and relationships mentioned in the description.
2. **Image Observation**: Examine object attributes, materials, and quantities, and work out how the two views relate to each other.
3. **Uniqueness Evaluation**: Combine the information from both views with the description and determine whether exactly one object fits.
4. **Final Judgment**: Provide a **Rationale** and a **Score** (0-10) on separate lines, in the format:
   Rationale: xxxx
   Score: [0-10]

### Example:
**Input:**
- **Image**: (An image containing two views will be provided)
- **Description**: "A brown armchair facing a wooden desk."
- **Additional Information**: xxxx (Reasons for choosing the images or other additional information)

**Output:**
1. **Description Analysis**: The description mentions a *brown armchair* and a *wooden desk*, with a spatial relation (*facing*).
2. **Image Observation**:
   - The image contains **three brown armchairs** and **one red armchair**.
   - **The red armchair does not match the description**.
   - **One brown armchair is not facing the desk, making it irrelevant**.
3. **Uniqueness Evaluation**:
   - **Two brown armchairs are still facing desks, so the description lacks uniqueness**.
4. **Final Judgment**:
   Rationale: two brown armchairs face desks, so the description lacks uniqueness.
   Score: 3

### Scoring Criteria:
- **0-2**: No object matches the description.
- **3-5**: Multiple objects match; uniqueness is unclear.
- **6-8**: One object primarily matches, but ambiguity remains.
- **9-10**: Only one object clearly matches the description.

### Key Considerations:
- Consider object attributes, spatial relations, and context.
- Be strict in determining uniqueness. If either of the two views shows that the description is not unique, assign a low (non-unique) score.
- The **Rationale** must always be on the second-to-last line.
- The **Score** must always be on the last line.
"""

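
# Illustrative sketch only, not the original pipeline: the strictness rule in
# SYSTEM_TEXT_IMAGE_UNIQUENESS_FOR_REFINE (a non-unique finding in either view
# yields a low score) behaves like taking the minimum of per-view uniqueness
# scores. The prompt itself asks the model for one score over a single
# two-view image; this hypothetical helper only expresses the same rule as a
# post-hoc check on two separately scored views.
from typing import Optional


def strict_two_view_uniqueness(
    score_a: Optional[float], score_b: Optional[float]
) -> Optional[float]:
    """Return the lower of the available view scores, or None if neither was scored."""
    available = [s for s in (score_a, score_b) if s is not None]
    return min(available) if available else None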
# ==== refines ===

SYSTEM_TEXT_REFER_JUDGE = """
### Task: 
You are an AI judge. Given a JSON message, integrate all information and evaluate the final score.

Introduction:
Evaluate the final score based on four key pieces of information: "Distinguishability", "Ambiguity", "Logical" and "Consistency".
The **RATIONALE MESSAGES** in the JSON message can be used as auxiliary information for reference.

Scoring Basis:
1. Distinguishability:
   - A list of {k: v} pairs. Both k and v range from 0 to 10; v may be null.
   - A positive sample requires both k and v to be high (v ≥ 6).
   - If k is high but v is low (v < 5) or null, it is a negative sample.

2. Ambiguity:
   - Score range: 0-10.
   - Focus on high values (≥7). Multiple high values (≥7) indicate problems.
   - More than 3 values ≥7 means a low final score.

3. Logical:
   - If "Logical" is low, the score is low (0).
   
4. Consistency:
   - Score range: 0-10.
   - A low score (<3) is a bad sign for logical consistency.

Scoring Criteria:
- 1-2:
  - No positive samples in "Distinguishability" / More than 3 values ≥7 in "Ambiguity" / "Logical" is low
- 3-5:
  - No clear positive samples in "Distinguishability", but moderate scores exist (like 5:6).
  - "Ambiguity" has high values (6-10), but only one or two values of 9-10 (bad signs).
- 6-7:
  - At least one clear positive sample in "Distinguishability" (like 9:10), and at most one 10 in "Ambiguity".
- 8-9:
  - Multiple clear positive samples in "Distinguishability" (like 9:10), and no more than one high value in "Ambiguity".


Additional Rules:
- If "Dense" and "Complex" are both ≤2, relax the "Distinguishability" evaluation (6:6 can be positive).
- A long sentence (>30 words) can increase the score by 1-2 points.
- A low "Consistency" score (<3) can decrease the score by 2 points.

Example 1:
Input:
{
  "JUDGE MESSAGES": {
    "Distinguishability": [{10: 4}, {4: null}, {9: 2}],
    "Ambiguity": [4, 10, 4, 5],
    "Serious Logical Error": 0,
    "Consistency": 9
  },
  "BASIC MESSAGES": {
    "Sentence": "This chair is placed near a table. Another same type of chair is besides this chair.",
    "Dense": 4,
    "Complex": 8
  },
  "RATIONALE MESSAGES": {
    "Existence": "xxx",
    "Uniqueness 1": "xxx",
    "Uniqueness 2": "xxx",
    "Logic": "xxx",
    "Consistency": "xxx"
  }
}
Output:
- "Distinguishability": No positive samples.
- "Ambiguity": One '10', low high-value proportion.
- "Serious Logical Error": 0
- "Consistency": 9, logically sound

- Final Judgment: 
  Rationale: A brief summary
  Score: 2

Example 2:
Input:
{
  "JUDGE MESSAGES": {
    "Distinguishability": [{8: 10}, {5: null}, {1: null}],
    "Ambiguity": [1, 3, 1, 4],
    "Logical": 10,
    "Consistency": 8
  }
}
Output:
- "Distinguishability": Positive sample (8:10).
- "Ambiguity": No critical high values.
- "Logical": 10, "Consistency": 8, logically sound
- Final Judgment:
  Rationale: A brief summary
  Score: 9

### Note:
- The Final Judgment **Score** must always be on the last line.
- "Rationale: A brief summary" must always be on the second-to-last line.
"""

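
# Rough sketch under stated assumptions, not the original pipeline: computes
# the rule-based signals that SYSTEM_TEXT_REFER_JUDGE describes, without
# reproducing the model's final 1-9 score. Assumptions (not fixed by the
# prompt): "k is high" means k >= k_threshold (hypothetical default 7), v must
# be >= 6 as the prompt states, and when "Dense" and "Complex" are both <= 2
# the threshold for k drops to 6 so that a 6:6 pair counts as positive.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JudgeSignals:
    positive_samples: int
    high_ambiguity_count: int
    ambiguity_forces_low_score: bool


def judge_signals(
    distinguishability: List[Dict[int, Optional[int]]],
    ambiguity: List[int],
    dense: Optional[int] = None,
    complex_score: Optional[int] = None,
    k_threshold: int = 7,
) -> JudgeSignals:
    relaxed = (
        dense is not None and complex_score is not None
        and dense <= 2 and complex_score <= 2
    )
    k_min = 6 if relaxed else k_threshold
    positives = 0
    for pair in distinguishability:
        for k, v in pair.items():
            # Real JSON object keys arrive as strings, so cast k before comparing.
            if v is not None and int(k) >= k_min and v >= 6:
                positives += 1
    high = sum(1 for a in ambiguity if a >= 7)
    # "More than 3 values >= 7" in Ambiguity means a low final score.
    return JudgeSignals(positives, high, high > 3)


# Example 1 above gives JudgeSignals(0, 1, False); Example 2 gives JudgeSignals(1, 0, False).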